{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1r5MH4aIzxT4cNOrEetH0ZvyTTY0UQRNP)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## AIMon's LlamaIndex Extension for LLM Response Evaluation\n",
    "\n",
    "This notebook introduces AIMon's evaluators for the LlamaIndex framework, which are designed to assess the quality and accuracy of responses generated by language models (LLMs) integrated into LlamaIndex. Below is an overview of all available evaluators:\n",
    "\n",
    "- **Hallucination Evaluator**: Detects when a model generates information not supported by the provided context (hallucinations).\n",
    "- **Guideline Evaluator**: Ensures model responses follow predefined instructions and guidelines.\n",
    "- **Completeness Evaluator**: Checks whether the response fully addresses all aspects of the query or task.\n",
    "- **Conciseness Evaluator**: Evaluates if the response is brief yet complete, avoiding unnecessary verbosity.\n",
    "- **Toxicity Evaluator**: Flags harmful, offensive, or inappropriate language in the response.\n",
    "- **Context Relevance Evaluator**: Assesses the relevance and accuracy of the provided context in supporting the model's response.\n",
    "\n",
    "In this notebook, we will focus on utilizing the **Hallucination Evaluator**, **Guideline Evaluator**, and **Context Relevance Evaluator** to evaluate your RAG (Retrieval-Augmented Generation) applications.\n",
    "\n",
    "To learn more about AIMon, check out these resources:: [Website](https://www.aimon.ai/) and [Documentation](https://docs.aimon.ai/)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prerequisites"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's get started by installing the dependencies and setting up the API keys."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture\n",
    "!pip install requests datasets aimon-llamaindex llama-index-embeddings-openai llama-index-llms-openai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Configure your `OPENAI_API_KEY` and `AIMON_API_KEY` in Google Collab secrets and provide them notebook access. We will use OpenAI for the LLM and embedding generation models. We will use AIMon for continuous monitoring of quality issues.\n",
    "\n",
    "AIMon API key can be obtained [here](https://docs.aimon.ai/quickstart#1-api-key)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import json\n",
    "\n",
    "# Import Colab Secrets userdata module.\n",
    "from google.colab import userdata\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = userdata.get(\"OPENAI_API_KEY\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dataset for evaluation\n",
    "\n",
    "In this example, we will be using the transcripts from MeetingBank dataset [1] as our contextual information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture\n",
    "from datasets import load_dataset\n",
    "\n",
    "meetingbank = load_dataset(\"huuuyeah/meetingbank\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This function helps extract transcripts and converts them into a list of objects of type `llama_index.core.Document`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import Document\n",
    "\n",
    "\n",
    "def extract_and_create_documents(transcripts):\n",
    "    documents = []\n",
    "\n",
    "    for transcript in transcripts:\n",
    "        try:\n",
    "            doc = Document(text=transcript)\n",
    "            documents.append(doc)\n",
    "\n",
    "        except Exception as e:\n",
    "            print(f\"Failed to create document\")\n",
    "\n",
    "    return documents\n",
    "\n",
    "\n",
    "transcripts = [meeting[\"transcript\"] for meeting in meetingbank[\"train\"]]\n",
    "documents = extract_and_create_documents(\n",
    "    transcripts[:5]\n",
    ")  ## Using only 5 transcripts to keep this example fast and concise."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set up an embedding model. We will be using the `text-embedding-3-small` model here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "\n",
    "embedding_model = OpenAIEmbedding(\n",
    "    model=\"text-embedding-3-small\", embed_batch_size=100, max_retries=3\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Split documents into nodes and generate their embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from aimon_llamaindex import generate_embeddings_for_docs\n",
    "\n",
    "nodes = generate_embeddings_for_docs(documents, embedding_model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Insert the nodes with embeddings into in-memory Vector Store Index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from aimon_llamaindex import build_index\n",
    "\n",
    "index = build_index(nodes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instantiate a Vector Index Retrieiver"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from aimon_llamaindex import build_retriever\n",
    "\n",
    "retriever = build_retriever(index, similarity_top_k=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building the LLM Application"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Configure the Large Language Model. Here we choose OpenAI's `gpt-4o-mini` model with temperature setting of 0.1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## OpenAI's LLM\n",
    "from llama_index.llms.openai import OpenAI\n",
    "\n",
    "llm = OpenAI(\n",
    "    model=\"gpt-4o-mini\",\n",
    "    temperature=0.4,\n",
    "    system_prompt=\"\"\"\n",
    "                    Please be professional and polite.\n",
    "                    Answer the user's question in a single line.\n",
    "                    Even if the context lacks information to answer the question, make\n",
    "                    sure that you answer the user's question based on your own knowledge.\n",
    "                    \"\"\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define your query and instructions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_query = \"Which council bills were amended for zoning regulations?\"\n",
    "user_instructions = [\n",
    "    \"Keep the response concise, preferably under the 100 word limit.\"\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Update the LLM's system prompt with the user's instructions defined dynamically"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "llm.system_prompt += (\n",
    "    f\"Please comply to the following instructions {user_instructions}.\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Retrieve a response for the query."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from aimon_llamaindex import get_response\n",
    "\n",
    "llm_response = get_response(user_query, retriever, llm)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Running Evaluations using AIMon"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Configure AIMon Client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from aimon import Client\n",
    "\n",
    "aimon_client = Client(\n",
    "    auth_header=\"Bearer {}\".format(userdata.get(\"AIMON_API_KEY\"))\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using AIMon’s Instruction Adherence Model (a.k.a. Guideline Evaluator)\n",
    "\n",
    "This model evaluates if generated text adheres to given instructions, ensuring that LLMs follow the user’s guidelines and intent across various tasks for more accurate and relevant outputs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from aimon_llamaindex.evaluators import GuidelineEvaluator\n",
    "\n",
    "guideline_evaluator = GuidelineEvaluator(aimon_client)\n",
    "evaluation_result = guideline_evaluator.evaluate(\n",
    "    user_query, llm_response, user_instructions\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"extractions\": [],\n",
      "    \"instructions_list\": [\n",
      "        {\n",
      "            \"explanation\": \"\",\n",
      "            \"follow_probability\": 0.982,\n",
      "            \"instruction\": \"Keep the response concise, preferably under the 100 word limit.\",\n",
      "            \"label\": true\n",
      "        }\n",
      "    ],\n",
      "    \"score\": 1.0\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "print(json.dumps(evaluation_result, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using AIMon’s Hallucination Detection Evaluator Model (HDM-2)\n",
    "\n",
    "AIMon’s HDM-2 detects hallucinated content in LLM outputs. It provides a “hallucination score” (0.0–1.0) quantifying the likelihood of factual inaccuracies or fabricated information, ensuring more reliable and accurate responses."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from aimon_llamaindex.evaluators import HallucinationEvaluator\n",
    "\n",
    "hallucination_evaluator = HallucinationEvaluator(aimon_client)\n",
    "evalution_result = hallucination_evaluator.evaluate(user_query, llm_response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"is_hallucinated\": \"False\",\n",
      "    \"score\": 0.22446,\n",
      "    \"sentences\": [\n",
      "        {\n",
      "            \"score\": 0.22446,\n",
      "            \"text\": \"The council bills amended for zoning regulations include the small lot moratorium and the text amendment related to off-street parking exemptions for preexisting small lots. These amendments aim to balance the interests of local neighborhoods, health institutions, and developers.\"\n",
      "        }\n",
      "    ]\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "## Printing the initial evaluation result for Hallucination\n",
    "print(json.dumps(evalution_result, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using AIMon's Context Relevance Evaluator to evaluate the relevance of context data used by the LLM to generate the response."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from aimon_llamaindex.evaluators import ContextRelevanceEvaluator\n",
    "\n",
    "evaluator = ContextRelevanceEvaluator(aimon_client)\n",
    "task_definition = (\n",
    "    \"Find the relevance of the context data used to generate this response.\"\n",
    ")\n",
    "evaluation_result = evaluator.evaluate(\n",
    "    user_query, llm_response, task_definition\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[\n",
      "    {\n",
      "        \"explanations\": [\n",
      "            \"Document 1 discusses a council bill related to zoning regulations, specifically mentioning a text amendment that aims to balance neighborhood interests with developer needs. However, it primarily focuses on parking issues and personal experiences rather than detailing specific zoning regulation amendments or the council bills directly related to them, which makes it less relevant to the query.\",\n",
      "            \"2. Document 2 mentions zoning and development issues, including the need for mass transit and affordability, but it does not provide specific information on which council bills were amended for zoning regulations. The discussion is more about general concerns regarding development and transportation rather than direct references to zoning amendments.\",\n",
      "            \"3. Document 3 touches on zoning laws and amendments but does not specify which council bills were amended for zoning regulations. While it discusses the context of zoning and housing, it lacks concrete details that directly answer the query about specific bills.\",\n",
      "            \"4. Document 4 discusses broader issues about affordable housing and transportation without directly addressing any specific council bills or amendments related to zoning regulations. The focus is on general priorities and funding rather than specific legislative changes, making it less relevant to the query.\",\n",
      "            \"5. Document 5 mentions support for a zoning code amendment regarding parking exemptions for small lots, which is somewhat related to zoning regulations. However, it does not provide specific details about the council bills amended for zoning regulations, thus failing to fully address the query.\"\n",
      "        ],\n",
      "        \"query\": \"Which council bills were amended for zoning regulations?\",\n",
      "        \"relevance_scores\": [\n",
      "            40.5,\n",
      "            40.25,\n",
      "            44.25,\n",
      "            38.5,\n",
      "            43.0\n",
      "        ]\n",
      "    }\n",
      "]\n"
     ]
    }
   ],
   "source": [
    "print(json.dumps(evaluation_result, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "In this notebook, we built a simple RAG application using the LlamaIndex framework. After retrieving a response to a query, we assessed it with AIMon’s evaluators."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[1]. Y. Hu, T. Ganter, H. Deilamsalehy, F. Dernoncourt, H. Foroosh, and F. Liu, \"MeetingBank: A Benchmark Dataset for Meeting Summarization,\" arXiv, May 2023. [Online]. Available: https://arxiv.org/abs/2305.17529. Accessed: Jan. 16, 2025."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
